Garbage in, garbage out: Impact of sequence matching based text cleaning and phrase identification on unsupervised text mining

Authors

  • Arho Suominen
  • Hannes Toivanen
Abstract

In 2006, Daim et al. published a highly cited paper on forecasting emerging technologies with bibliometrics and patent analysis. In that paper, scientific publications and patents were used as numerical input to, for example, system dynamics models or scenarios, describing the current state and trend of technological development as a year-to-year indicator value. By fitting the indicator to well-known growth models, the analyst also obtained an indication of future development. This approach of counting publications or patents is to a significant extent valid for producing a “how much” indicator, but it says far less about the “what” of technological development.

Similar resources

Competitive Intelligence Text Mining: Words Speak

Competitive intelligence (CI) has become one of the major subjects for researchers in recent years. The present research is aimed to achieve a part of the CI by investigating the scientific articles on this field through text mining in three interrelated steps. In the first step, a total of 1143 articles released between 1987 and 2016 were selected by searching the phrase "competitive intellige...

Question Pre-Processing In A QA System On Internet Discussion Groups

This paper proposes methods to pre-process questions in postings before a QA system can find answers in a discussion group on the Internet. Pre-processing includes garbage text removal and question segmentation. Garbage keywords are collected and different length thresholds are assigned to them for garbage text identification. Interrogative forms and question types are used to segment quest...

Unsupervised Text Mining for Ontology Extraction: An Evaluation of Statistical Measures

We report on a comparative evaluation carried out in the field of unsupervised text mining. We have worked on a parsed medical corpus, on which we have used different statistical measures. Using those measures, we rate the verb-object dependencies and we select the most reliable ones according to each measure. We then apply pattern matching and clustering algorithms to the classes of dependenci...

Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...

Smart Garbage Alert System

Cleaning the dustbin is an important process that has to be done on a regular basis, but because the bin fills with waste at an inconsistent rate, it sometimes overflows before the next scheduled cleaning. It is also observed that garbage accumulates when it is removed from the dustbin irregularly. Here we have figured out a new model for the municipal dustbins which ...

Publication date: 2014